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Abstract 

Background: The regulation of gene expression by transcription factors is a key determinant of cellular phenotypes. 
Deciphering genome-wide networks that capture which transcription factors regulate which genes is one of the 
major efforts towards understanding and accurate modeling of living systems. However, reverse-engineering the 
network from gene expression profiles remains a challenge, because the data are noisy, high dimensional and sparse, 
and the regulation is often obscured by indirect connections. 

Results: We introduce a gene regulatory network inference algorithm ENNET, which reverse-engineers networks of 
transcriptional regulation from a variety of expression profiles with a superior accuracy compared to the 
state-of-the-art methods. The proposed method relies on the boosting of regression stumps combined with a relative 
variable importance measure for the initial scoring of transcription factors with respect to each gene. Then, we 
propose a technique for using a distribution of the initial scores and information about knockouts to refine the 
predictions. We evaluated the proposed method on the DREAM3, DREAM4 and DREAM5 data sets and achieved 
higher accuracy than the winners of those competitions and other established methods. 

Conclusions: Superior accuracy achieved on the three different benchmark data sets shows that ENNET is a top 
contender in the task of network inference. It is a versatile method that uses information about which gene was 
knocked-out in which experiment if it is available, but remains the top performer even without such information. 
ENNET is available for download from https://github.com/slawekj7ennet under the GNU GPLv3 license. 
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Background 

Regulation of gene expression is a key driver of adaptation 
of living systems to changes in the environment and to 
external stimuli. Abnormalities in this highly coordinated 
process underlie many pathologies. At the transcription 
level, the control of the amount of mRNA transcripts 
involves epigenetic factors such as DNA methylation and, 
in eukaryotes, chromatin remodeling. But the key role 
in both prokaryotes and eukaryotes is played by tran- 
scription factors (TF), that is, proteins that can bind 
to DNA in the regulatory regions of specific genes and 
act as repressors or inducers of their expression. Many 
interactions between transcription factors and genes they 
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regulate have been discovered through traditional molec- 
ular biology experiments. With the introduction of high- 
throughput experimental techniques for measuring gene 
expression, such as DNA microarrays and RNA-Seq, the 
goal moved to reverse-engineering genome-wide gene 
regulatory networks (GRNs) [1]. Knowledge of GRNs 
can facilitate finding mechanistic hypotheses about dif- 
ferences between phenotypes and sources of pathologies, 
and can help in the drug discovery and bioengineering. 

High throughput techniques allow for collecting 
genome-wide snapshots of gene expression across dif- 
ferent experiments, such as diverse treatments or other 
perturbations to cells [2]. Analyzing these data to infer 
the regulatory network is one of the key challenges in 
the computational systems biology. The difficulty of this 
task arises from the nature of the data: they are typically 
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noisy, high dimensional, and sparse [3]. Moreover, dis- 
covering direct causal relationships between genes in the 
presence of multiple indirect ones is not a trivial task 
given the limited number of knockouts and other con- 
trolled experiments. Attempts to solve this problem are 
motivated from a variety of different perspectives. Most 
existing computational methods are examples of influ- 
ence modeling, where the expression of a target transcript 
is modeled as a function of the expression levels of some 
selected transcription factors. Such a model does not 
aim to describe physical interactions between molecules, 
but instead uses inductive reasoning to find a network of 
dependencies that could explain the regularities observed 
among the expression data. In other words, it does not 
explain mechanistically how transcription factors interact 
with regulated genes, but indicate candidate interactions 
with a strong evidence in expression data. This knowledge 
is crucial to prioritize detailed studies of the mechanics of 
the transcriptional regulation. 

One group of existing methods describes GRN as a sys- 
tem of ordinary differential equations. The rate of change 
in expression of a transcript is given by a function of 
the concentration levels of transcription factors that reg- 
ulate it. Network inference includes two steps: a selection 
of a model and an estimation of its parameters. Popu- 
lar models imply linear functions a priori [4-7]. Bayesian 
Best Subset Regression (BBSR) [8] has been proposed as 
a novel model selection approach, which uses Bayesian 
Information Criterion (BIC) to select an optimal model 
for each target gene. Another group of methods employ 
probabilistic graphical models that analyze multivariate 
joint probability distributions over the observations, usu- 
ally with the use of Bayesian Networks (BN) [9-11], or 
Markov Networks (MN) [12]. Various heuristic search 
schemes have been proposed in order to find parameters 
of the model, such as greedy-hill climbing or the Markov 
Chain Monte Carlo approach [13]. However, because 
learning optimal Bayesian networks from expression data 
is computationally intensive, it remains impractical for 
genome-wide networks. 

Other approaches are motivated from statistics and 
information theory. TwixTwir [14] uses double two-way t- 
test to score transcriptional regulations. The null-mutant 
z-score algorithm [15] scores interactions based on a 
z-score transformed knockout expression matrix. Vari- 
ous algorithms rely on estimating and analyzing cross- 
correlation and mutual information (MI) of gene expres- 
sion in order to construct a GRN [16-20], including 
ANOVA rj 2 method [21]. Improvements aimed at remov- 
ing indirect edges from triples of genes have been pro- 
posed, including techniques such as the Data Processing 
Inequality in ARACNE [22,23], and the adaptive back- 
ground correction in CLR [24]. Another method, NAR- 
ROMI [25], eliminates redundant interactions from the 



MI matrix by applying ODE-based recursive optimization, 
which involves solving a standard linear programming 
model. 

Recently, machine-learning theory has been used to for- 
mulate the network inference problem as a series of super- 
vised gene selection procedures, where each gene in turn 
is designated as the target output. One example is MRNET 
[26], which applies the maximum relevance/minimum 
redundancy (MRMR) [27] principle to rank the set of 
transcription factors according to the difference between 
mutual information with the target transcript (maximum 
relevance) and the average mutual information with all 
the previously ranked transcription factors (minimum 
redundancy). GENIE3 [28] employs Random Forest algo- 
rithm to score important transcription factors, utilizing 
the embedded relative importance measure of input vari- 
ables as a ranking criterion. TIGRESS [29] follows a sim- 
ilar approach but is based on the least angle regression 
(LARS). Recently, boosting [30,31] was also used to score 
the importance of transcription factors, in ADANET [32] 
and OKVAR-Boost [33] methods. 

In this paper, we propose a method that combines 
gradient boosting with regression stumps, augmented 
with statistical re-estimation procedures for prioritizing 
a selected subset of edges based on results from the 
machine-learning models. We evaluated our method on 
the DREAM3, DREAM4 and DREAM5 network inference 
data sets, and achieved results that in all cases were better 
than the currently available methods. 

Methods 

The ENNET algorithm 

Formulating the gene network inference problem 

The proposed algorithm returns a directed graph of regu- 
latory interactions between P genes in form of a weighted 
adjacency matrix V e IR PxP , where vq represents regula- 
tion of gene j by gene /. As an input, it takes gene expres- 
sion data from a set of experiments, together with the 
meta-data describing the conditions of the experiments, 
including which genes were knocked out. Usually, the raw 
expression data need to be pre-processed before any infer- 
ence method could be applied to reverse-engineer a GRN. 
Pre-processing has a range of meanings, here it is regarded 
as a process of reducing variations or artifacts, which are 
not of the biological origin. It is especially important when 
the expression is measured with multiple high-density 
microarrays [34] . Concentration levels of transcripts must 
be adjusted and the entire distribution of adjusted values 
aligned with a normal distribution. Methods for normal- 
ization of expression data are outside of the scope of our 
work. The data we used were already normalized using 
RMA [34,35] by the DREAM challenge organizers. We 
further normalized the expression data to zero mean and 
unit standard deviation. 
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The network inference process relies heavily on the 
type of expression data provided as an input. Two main 
groups of expression profiles are: the one with known, 
and the one with unknown initial perturbation state 
of the expression of genes in the underlying network 
of regulatory interactions. For example, knockout and 
knockdown data are provided with the additional meta- 
data, which describe which genes were initially perturbed 
in each experiment. On the other hand, multifactorial 
and time series data are usually expression profiles of 
an unknown initial state of genes. Wildtype, knockout, 
knockdown, and multifactorial data describe the expres- 
sion of initially perturbed genes, which are however in 
a steady state at the time of measurement, whereas time 
series data describe the dynamics of the expression lev- 
els of initially perturbed genes. The types of data avail- 
able in popular benchmark data sets are summarized in 
Table 1. 

The variability of possible input scenarios poses a prob- 
lem of representing and analyzing expression data. Here, 
we operate on an N x P expression matrix £, where 
is the expression value of the j-th gene in the i-th sample. 
Columns of matrix E correspond to genes, rows corre- 
spond to experiments. We also define a binary perturba- 
tion matrix 7<T, where Iqj is a binary value corresponding 
to the j-th gene in the i-th sample, just like in the matrix 
E. If Iq f j is equal to 1, it means that the j-th gene is known 
to be initially perturbed, for example knocked out, in 
the i-th experiment. Otherwise Iqj is equal to 0. If no 
information is available about knockouts, all values are 
set to 0. 

Decomposing the inference problem into gene selection 
problems 

We decompose the problem of inferring the network 
of regulatory interactions targeting all P genes into P 
independent subproblems. In each subproblem incom- 
ing edges from transcription factors to a single gene 
transcript are discovered. For the k-th decomposed sub- 
problem we create a target expression vector and a 
feature expression matrix X_j < . Columns of theX_j < matrix 

Table 1 Different types of expression data provided in 



popular data sets 



Data set 


WT 


KO 


KD 


MF 


TS 


DREAM3 size 1 00 


• 


• 


• 


o 


• 


DREAM4 size 1 00 


• 


• 


• 


o 


• 


DREAM4 size 1 00 MF 


o 


o 


o 


• 


o 


DREAM5* 


• 


• 


• 


• 


• 



Different types of expression data provided in popular data sets: WT- Wildtype, 
KO- Knockouts, KD- Knockdowns, MF- Multifactorial, TS- Time series, • Available, 
o Unavailable. ★) Even though all the data types are available, they are all 
processed as MF. 



constitute a set of possible transcription factors. Vector Y^ 
corresponds to the expression of the transcript, which is 
possibly regulated by transcription factors from In a 
single gene selection problem we decide which TFs con- 
tribute to the target gene expression across all the valid 
experiments. Columns of correspond to all the possi- 
ble TFs, but if a target gene k is also a transcription factor, 
it is excluded from We do not consider a situation 
in which a transcription factor would have a regulatory 
interaction with itself. When building the target vector Y^ 
corresponding to the k-th target gene, k e {1, we 
consider all the experiments valid except from the ones in 
which the k-th gene was initially perturbed, as specified in 
the perturbation matrix K. We reason that the expression 
value of the k-th gene in those experiments is not deter- 
mined by its TFs, but by the external perturbation. Each 
row in the Y^ vector is aligned with a corresponding row 
in the matrix. In order to justify all the possible inter- 
actions we need to solve a gene selection problem for each 
target gene. For example, if a regulatory network consists 
of four genes (P = 4), we need to solve four gene selec- 
tion problems. In the k-th problem, k e {1, 2, 3, 4}, we find 
which TFs regulate the k-th target gene. In other words, 
we calculate the k-th column of the output adjacency 
matrix V. 

Solving the gene selection problems 

Once the target gene expression vector Y^ and the TF 
expression matrix X_j < are created for each gene /<, we 
solve each /c-th gene selection problem independently, in 
the following way. We search for the subset of columns in 
X_/ < that are related to the target vector Y^ by an unknown 
function ft, as shown in Equation 1, 

V* € {1, P), 3f k : Y k =f k (X_ k ) + € k , (1) 

where 6/ c is a random noise. A function ft represents a 
pattern of regulatory interactions that drive the expres- 
sion of the k-th gene. We want ft to rely only on a small 
number of genes acting as transcription factors, those 
that are the true regulators of gene /c. Essentially, this is 
a feature selection or a gene selection task [28,32,36,37], 
where the goal is to model the target response Y^ with 
an optimal small set of important predictor variables, 
i.e., a subset of columns of the X_£ matrix. A more 
relaxed objective of the gene selection is the variable rank- 
ing, where the relative relevance for all input columns 
of the matrix is obtained with respect to the tar- 
get vector Yj^ The higher a specific column is in that 
ranking, the higher the confidence that a correspond- 
ing TF is in a regulatory interaction with the target 
gene /<. 

Our solution to the variable ranking involves ensemble 
learning. We use an iterative regression method, which 
in each iteration chooses one transcription factor based 
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on an optimality criterion, and adds it to the non-linear 
regression ensemble. The main body of our method, pre- 
sented in Figure 1, is based on Gradient Boosting Machine 
[38] with a squared error loss function. First, ENNET ini- 
tializes fo to be an optimal constant model, without select- 
ing any transcription factor. In other words,yb is initialized 
to an average of Y^. At each next t-th step the algorithm 
creates an updated model f t , by fitting a base learner ht 
and adding it to the previous model f t -\. The base learner 



is fitted to a sample of pseudo residuals, with respect to 
a sample of transcription factors, and thus is expected 
to reduce the error of the model. Pseudo-residuals are 
re-calculated at the beginning of each iteration with 
respect to the current approximation f t . As a base learner, 
we use regression stumps, which select a single TF that 
best fits pseudo residuals. A regression stump h t (x) par- 
titions the expression values x of a candidate TF into two 
disjoint regions R\ t and R<in where R<it = M — R\u and 



Require: 

expression of TFs: x G !R iYxP , expression of target gene: y G M. N , 
number of iterations: T G N, shrinkage factor: v G [0, 1], 
sampling rate for observations: s s G [0, 1], 
sampling rate for TFs: Sf G [0,1]. 
Ensure: importance of TFs: I 2 G IR P 



1. Initialize model with arithmetic mean of y, set impor- 
tance I 2 to 0, and set step number to 1: 
fo(x) ^y, 
t <- 1, I 2 <- 0. 




10. Output scaled I 2 
over all trees: 



3. Calculate pseudo-residuals: 
ru <- Vi ~ ft-i(xi), i G {1, N}. 



4. Randomly select s s • N observations with replacement, 
and Sf • P TFs without replacement. 



5. Find regions Ri tl R 2 t, and relative importance i 2 of 
(p-th TF by training one-level regression tree on pseudo- 
residuals using only randomly selected observations and 
TFs: 

{if, (f tj R lt , R 2t } <- train_tree(^, r it ). 



6. Calculate 7^, 72^ using only randomly selected observa- 
tions and define the regresion stump h t : 

. J2i yi-ft-i(xi) c p 

^~ card{x z £R lt } ' Xi ^ jfX ^' 

™ ^_ Ei yi-ft-i(xi) c p 

72t <- card^Gflit} ' X i G 

h(x) = JiJ(x G Ru) + 72t^O G i?2t) 



l 


7. Update model using all observations: 
ft(x) <- /t-i(x) + 


j 


1 


8. Update importance of y?-th TF: 

3. 


j 



Figure 1 The flowchart of the ENNET algorithm. ENNET algorithm is a modification of a Gradient Boosting Machine algorithm, with a squared 
error loss function and a regression stump base learner. The algorithm calculates a vector of importance scores of transcription factors, which can 
possibly regulate a target gene. It is invoked P times in a problem of inferring a P-gene network, i.e., a P-column adjacency matrix V. 
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returns values y\ t and yi t) respectively, for those regions, 
as shown in Equation 2, 

h t (x) = yi t I(x e Rit) + Y2 t I(x e R 2t ), (2) 

where / is the identity function returning the numerical 
1 for the logical true, and the numerical 0 for the log- 
ical false. Regions R\ t , Rit are induced such that the 
least-squares improvement criterion is maximized: 

i 2 (Rit,R2t) = (Yit ~ Y2tf, (3) 

Wit + W2t 

where wi t , W2t are proportional to the number of obser- 
vations in regions Ri t , Ri t respectively, and yi t , y<it are 
corresponding response means. That is, yn is the average 
of the values from the vector of pseudo-residuals for those 
samples where an expression of the chosen TF falls into 
the region Ri t . The value of yit is defined in an analogous 
way. The averages yi t and yit are used as the regres- 
sion output values for regions Ri t and Ri t , respectively, as 
shown in Equation 2. The criterion in Equation 3 is eval- 
uated for each TF, and the transcription factor with the 
highest improvement is selected. In each t-th step, we only 
use a random portion of rows and columns of sam- 
pled according to the observation sampling rate s s , and the 
TF sampling rate Sf. 

The procedure outlined above creates a non-linear 
regression model of the target gene expression based on 
the expression of transcription factors. However, in the 
network inference, we are interested not in the regression 
model as a whole, but only in the selected transcription 
factors. In each t-th step of the ENNET algorithm, only 
one TF is selected as the optimal predictor. The details of 
the regression model can be used to rank the selected TFs 
by their importance. Specifically, if a transcription factor 
<Pt is selected in an iteration t, an improvement i 2 serves as 
an importance score I 2 t for that cpt-th TF. If the same TF 
is selected multiple times at different iterations, its final 
importance score is a sum of the individual scores. 

In the training of the regression model, the parameter 
v, known as a shrinkage factor, is used to scale a contri- 
bution of each tree by a factor v e (0, 1) when it is added 
to the current approximation. In other words, v controls 
the learning rate of the boosting procedure. Shrinkage 
techniques are also commonly used in neural networks. 
Smaller values of v result in a larger training risk for the 
same number of iterations T. However, it has been found 
[38] that smaller values of v reduce the test error, and 
require correspondingly larger values of T, which results 
in a higher computational overhead. There is a trade-off 
between these two parameters. 

Refining the inferred network 

Once the solutions of the independent gene selection 
problems are calculated, we compose the adjacency 



matrix V representing a graph of inferred regulatory inter- 
actions. Each of the solutions constitutes a single column- 
vector, therefore we obtain the adjacency matrix V by 
binding all the partial solutions column-wise. Then we 
apply a re-evaluation algorithm to achieve an improved 
final result. The first step does not require any addi- 
tional data to operate other than the previously calculated 
adjacency matrix V. It exploits the variance of edge prob- 
abilities in the rows of V, i.e., edges outgoing from a 
single transcription factor, as a measure of the effect of 
transcriptional regulation. We score transcription factors 
based on their effects on multiple targets. We assume that 
the effect of transcriptional regulation on a directly regu- 
lated transcript is stronger than the one of the regulation 
on indirectly regulated transcripts, e.g. transcripts reg- 
ulated through another transcription factor. Otherwise, 
knocking out a single gene in a strongly connected compo- 
nent in a network of regulatory interactions would cause 
the same rate of perturbation of the expression level of all 
the transcripts in that component. As a measure of that 
effect we use previously a calculated adjacency matrix V 
and multiply each row of V matrix by its variance of. An 
updated adjacency matrix V 1 is given by Equation 4: 

V(i,/) : v\, = of ■ ViJ , (4) 

where of is a variance in the i-th row of V. Note that V 
matrix is built column-wise, i.e., a single column of V con- 
tains the relative importance scores of all the transcription 
factors averaged over all the base learners with respect to 
a single target transcript. On the other hand, rows of V 
matrix are calculated independently in different subprob- 
lems of the proposed inference method. Each row of V 
contains relative importance scores with respect to a dif- 
ferent target transcript. We reason that if a transcription 
factor regulates many target transcripts, e.g. a transcrip- 
tion factor is a hub node, the variance in a row of V 
corresponding to that transcription factor is elevated and 
therefore it indicates an important transcription factor. 

The second step of refining the network requires knock- 
out expression data. We reason that direct regulation of 
a transcript by a transcription factor would lead to a dis- 
tinct signature in the expression data if the transcription 
factor was knocked out. A similar reasoning gave founda- 
tions for the null-mutant z-score method [15] of reverse- 
engineering GRNs. However, in the proposed method this 
step is only applied if knockout expression profiles are 
available. In this step we calculate an adjacency matrix V 2 , 
which is an update to an already derived adjacency matrix 
V 1 , as shown in Equation 5: 

«(/) = {r : k Ti * 0}, p(t) = {r : k rJ = 0}, (5) 
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where e a (/y * s an avera g e expression value of the j-th 
transcript in all the experiments a(i) in which the i-th 
gene was knocked-out, as defined by K matrix, ^(/y is 
the mean expression value for that transcript across all 
the other knockout experiments, /?(/), and Oj is the stan- 
dard deviation of the expression value of that transcript in 
all the knockout experiments. The | ea ®>i e PW \ coefficient 
shows how many standard deviations the typical expres- 
sion of the j-th transcript was different from the average 
expression in the experiment in which its potential i-th 
transcription factor was knocked-out. 

Performance evaluation 

A considerable attention has been devoted in recent years 
to the problem of evaluating performance of the infer- 
ence methods on adequate benchmarks [35,39]. The most 
popular benchmarks are derived from well-studied in vivo 
networks of model organisms, such as E. coli [40] and S. 
cerevisiae [41], as well as artificially simulated in silico net- 
works [39,42-45]. The main disadvantage of in vivo bench- 
mark networks is the fact that experimentally confirmed 
pathways can never be assumed complete, regardless of 
how well the model organism is studied. Such networks 
are assembled from known transcriptional interactions 
with strong experimental support. As a consequence, gold 
standard networks are expected to have few false posi- 
tives. However, they contain only a subset of the true inter- 
actions, i.e., they are likely to contain many false negatives. 
For this reason, artificially simulated in silico networks 
are most commonly used to evaluate network inference 
methods. Simulators [39] mimic real biological systems 
in terms of topological properties observed in biological 
in vivo networks, such as modularity [46] and occur- 
rences of network motifs [47]. They are also endowed 
with dynamical models of a transcriptional regulation, 
thanks to the use of non-linear differential equations and 
other approaches [42,48,49], and consider both transcrip- 
tion and translation processes in their dynamical models 
[48-50] using a thermodynamic approach. Expression data 
can be generated deterministically or stochastically and 
experimental noise, such as the one observed in microar- 
rays, can be added [51]. 

Here, we used several popular benchmark GRNs to eval- 
uate the accuracy of our proposed algorithm and compare 
it with the other inference methods. The data sets we used 
come from Dialogue for Reverse Engineering Assessments 
and Methods (DREAM) challenges and are summarized 
in Table 1. We evaluated the accuracy of the methods 
using the Overall Score metric proposed by the authors of 
DREAM challenges [35], as shown in Equation 6: 



Overall Score = - - • log 10 (p aupr • /? auroc ), (6) 



where j? aupr and ^ auroc are geometric means of p-values of 
networks constituting each DREAM challenge, relating to 
an area under the Precision-Recall curve (AUPR) and an 
area under the ROC curve (AUROC), respectively. 

Results and discussion 

We assessed the performance of the proposed inference 
algorithm on large, universally recognized benchmark 
networks of 100 and more genes, and compared it to the 
state-of-the-art methods. We summarize the results of 
running different inference methods in Figure 2. For a 
comparison we selected a range of established methods 
from literature: ARACNE, CLR, and MRNET as imple- 
mented in the minet R package [52], GENIE3 and C3NET 
as implemented by their respective authors, our pre- 
viously reported method ADANET, and the top three 
performers in each of the three DREAM challenges as 
listed on the DREAM web site. Some of the methods 
were designed for use with knockout data, while others 
are developed with multifactorial data in mind, where no 
information is given about the nature of the perturba- 
tions. Therefore, depending on the nature of the particular 
DREAM data set, only the suitable group of methods is 
used for the comparison. 

The accuracy of ENNET 

DREAM3 [15,53,54] features in silico networks and 
expression data simulated using GeneNetWeaver soft- 
ware. Benchmark networks were derived as subnetworks 
of a system of regulatory interactions from known model 
organisms: E. coli and S. cerevisiae. In this study we focus 
on a DREAM3 size 100 subchallenge, as the largest of 
DREAM3 suite. The results of all the competing meth- 
ods except those that are aimed at multifactorial problems 
are summarized in Table 2. ENNET and Yip et al. meth- 
ods achieved the best Overall Scores for that subchallenge, 
as well as the best scores for all the individual networks. 
However, it is believed from the analysis of the later chal- 
lenges [39] that Yip et al. method made a strong assump- 
tion on the Gaussian type of a measurement noise, which 
was used in DREAM3, but was no longer used in later 
DREAM challenges. For example, in DREAM4 challenge 
Yip et al. method was ranked 7th. 

DREAM4 challenge [15,53,54] was posted one year after 
DREAM3 challenge. It features two large subchallenges: 
DREAM4 size 100, and DREAM4 size 100 multifactorial. 
For each subchallenge, the topology of the benchmark 
networks were derived from the transcriptional regula- 
tory system of E. coli and S. cerevisiae. In DREAM4 
size 100 subchallenge all the data types listed in Table 1 
were available except multifactorial, therefore ADANET, 
GENIE3, CLR, C3NET, MRNET, and ARACNE methods 
were excluded from the comparison. The results of all 
the methods are summarized in Table 3. ENNET method 
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DREAM 3 
DREAM 4 size 100 
DREAM 4 size 100 MF 
DREAM 5 



ENNET 

State-of-the-art methods 
Winner of the challenge 




"D 
O 

H — 1 

CD 



ENNET 
ADANET 
GENIE3 
C3NET 
CLR 
MRNET 
ARACNE 
1ST PLACE 
2ND PLACE 
3RD PLACE 



Overall Score 

Figure 2 The Overall Score of GRN inference methods by data set. Results of the different inference methods on DREAM challenges. Results of 
the state-of-the-art methods were collected after running the algorithms with the default sets of parameters on pre-processed data. Results in the 
"Winner of the challenge" part of the figure correspond to the best methods participating in the challenge. 




clearly outperformed all the others and achieved consis- 
tently high scores across all the benchmark networks. In 
the second DREAM4 large subchallenge, DREAM4 size 
100 multifactorial, only multifactorial data were available, 
therefore all the methods were included in the compari- 
son, and run as originally designed. The results of all the 
methods are summarized in Table 4. ENNET achieved the 
best Overall Score. 

Three benchmark networks in DREAM5 [35] were dif- 
ferent in size, and structured with respect to different 
model organisms. However, this time expression data of 
the only one network were simulated in silico, the two 
other sets of expression data were measured in real exper- 
iments in vivo. Like in all DREAM challenges, in silico 



expression data were simulated using an open-source 
GeneNetWeaver simulator [54]. However, DREAM5 was 
the first challenge where participants were asked to infer 
GRNs on a genomic scale, e.g. for thousands of tar- 
get genes, and hundreds of known transcription factors. 
Gold standard networks were obtained from two sources: 
RegulonDB database [40], and Gene Ontology (GO) anno- 
tations [55]. The results of all the inference methods for 
DREAM5 expression data are summarized in Table 5. 
ENNET achieved the best score for the in silico network, 
and the best Overall Score, as well as the best individ- 
ual AUROC scores for all the networks. Clearly all the 
participating methods achieved better scores for an in 
silico network than for either one of in vivo networks. 



Table 2 Results of the different inference methods on DREAM3 networks, challenge size 100 



Method 








Network (AUPR/AUROC respectively) 








Overall 


1 






2 


3 


4 






5 


Experimental results 


ENNET 


0.627 


0.901 


0.865 


0.963 


0.568 0.892 


0.522 


0.842 


0.384 


0.765 


>300 


Winner of the challenge 


Yipetal. 


0.694 


0.948 


0.806 


0.960 


0.493 0.915 


0.469 


0.856 


0.433 


0.783 


>300 


2nd 


0.209 


0.854 


0.249 


0.845 


0.184 0.783 


0.192 


0.750 


0.161 


0.667 


45.443 


3nd 


0.132 


0.835 


0.154 


0.879 


0.189 0.839 


0.179 


0.738 


0.164 


0.667 


42.240 



Results of the different inference methods on DREAM3 networks, challenge size 1 00. An area under the ROC curve (AUROC) and an area under the Precision-Recall 
curve (AUPR) are given for each network respectively. The overall Score for all the networks is given in the last column. The best results for each column are in bold. 
Numbers in the "Experimental results" part of the table were collected after running the algorithms with the default sets of parameters on pre-processed data. 
However, ADANET, GENIE3, CLR, C3NET, MRNET, and ARACNE methods, as they are originally defined, take a multifactorial matrix as an input, which is unavailable in 
this challenge. Therefore they were excluded from the comparison. Numbers in the "Winner of the challenge" part of the table correspond to the best methods 
participating in the challenge. 
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Table 3 Results of the different inference methods on DREAM4 networks, challenge size 100 



Method 








Network (AUPR/AUROC respectively) 








Overall 


1 




2 




3 




4 


5 




Experimental results 


EN NET 


0.604 


0.893 


0.456 


0.856 


0.421 0.865 


0.506 


0.878 


0.264 


0.828 


87.738 


Winner of the challenge 


Pinna et al. 


0.536 


0.914 


0.377 


0.801 


0.390 0.833 


0.349 


0.842 


0.213 


0.759 


71.589 


2nd 


0.512 


0.908 


0.396 


0.797 


0.380 0.829 


0.372 


0.844 


0.178 


0.763 


71.297 


3rd 


0.490 


0.870 


0.327 


0.773 


0.326 0.844 


0.400 


0.827 


0.159 


0.758 


64.715 



Results of the different inference methods on DREAM4 networks, challenge size 100. An area under the ROC curve (AUROC) and an area under the Precision-Recall 
curve (AUPR) are given for each network respectively. The Overall Score for all the networks is given in the last column. The best results for each column are in bold. 
Numbers in the "Experimental results" part of the table were collected after running the algorithms with the default sets of parameters on pre-processed data. 
However, ADANET, GENIE3, CLR, C3NET, MRNET, and ARACNE methods, as they are originally defined, take a multifactorial matrix as an input, which is unavailable in 
this challenge. Therefore they were excluded from the comparison. Numbers in the "Winner of the challenge" part of the table correspond to the best methods 
participating in the challenge. 



ENNET shows better in vivo results than the other meth- 
ods in terms of an area under the the ROC curve. Still, 
predictions for in vivo expression profiles show a low 
overall accuracy. One of the reasons for a poor per- 
formance of the inference methods for such expression 
profiles is a fact that experimentally confirmed pathways, 
and consequently gold standards derived from them, can- 
not be assumed complete, regardless of how well is a 
model organism known. Additionally, there are regula- 
tors of gene expression other than transcription factors, 
such as miRNA, and siRNA. As shown in this study, in 
silico expression profiles provide enough information to 
confidently reverse-engineer their underlying structure, 
whereas in vivo data hide a much more complex system of 
regulatory interactions. 



Computational complexity of ENNET 

Computational complexity of ENNET depends mainly on 
the computational complexity of the regression stump 
base learner, which is used in the main loop of the algo- 
rithm. As shown in Figure 1, we call the regression stump 
algorithm T times for each k-th target gene, k e {1, 
Given a sorted input, a regression stump is 0(PN) com- 
plex. We sort the expression matrix in an 0(PN log N) 
time. All the other instructions in the main loop of 
ENNET are at most 0(N). The computational complex- 
ity of the whole method is thus 0(PN\ogN + TP 2 N + 
TPN). Because, in practice, the dominating part of the 
sum is TP 2 N, we report a final computational complex- 
ity of ENNET as 0(TP 2 N), and compare it to the other 
inference methods in Table 6. Note that the measure for 



Table 4 Results of the different inference methods on DREAM4 networks, challenge size 100 multifactorial 



Method 








Network (AUPR/AUROC respectively) 








Overall 




1 




2 


3 






4 




5 


Experimental results 


ENNET 


0.184 


0.731 


0.261 


0.807 


0.289 


0.813 


0.291 


0.822 


0.286 


0.829 


52.839 


ADANET 


0.149 


0.664 


0.094 


0.605 


0.191 


0.703 


0.172 


0.712 


0.182 


0.694 


24.970 


GENIE3 


0.158 


0.747 


0.154 


0.726 


0.232 


0.777 


0.210 


0.795 


0.204 


0.792 


37.669 


C3NET 


0.077 


0.562 


0.095 


0.588 


0.126 


0.621 


0.113 


0.687 


0.110 


0.607 


15.015 


CLR 


0.142 


0.695 


0.118 


0.700 


0.178 


0.746 


0.174 


0.748 


0.174 


0.722 


28.806 


MRNET 


0.138 


0.679 


0.128 


0.698 


0.204 


0.755 


0.178 


0.748 


0.187 


0.725 


30.259 


ARACNE 


0.123 


0.606 


0.102 


0.603 


0.192 


0.686 


0.159 


0.713 


0.166 


0.659 


22.744 


Winner of the challenge 


GENIE3 


0.154 


0.745 


0.155 


0.733 


0.231 


0.775 


0.208 


0.791 


0.197 


0.798 


37.428 


2nd 


0.108 


0.739 


0.147 


0.694 


0.185 


0.748 


0.161 


0.736 


0.111 


0.745 


28.165 


3rd 


0.140 


0.658 


0.098 


0.626 


0.215 


0.717 


0.201 


0.693 


0.194 


0.719 


27.053 



Results of the different inference methods on DREAM4 networks, challenge size 100 multifactorial. An area under the ROC curve (AUROC) and an area under the 
Precision-Recall curve (AUPR) are given for each network respectively. The Overall Score for all the networks is given in the last column. The best results for each 
column are in bold. Numbers in the "Experimental results" part of the table were collected after running the algorithms with the default sets of parameters on 
pre-processed data. Numbers in the "Winner of competition" part of the table correspond to the best methods participating in the challenge. 
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Table 5 Results of the different inference methods on DREAM 5 networks 

Network (AUPR/AUROC respectively) 

Method Overall 

1 3 4 

Experimental results 

ENNET 0.432 0.867 0.069 0.642 0.021 0.532 >300 



ADANET 0.261 0.725 0.083 0.596 0.021 0.517 16.006 



GENIE3 


0.291 


0.814 


0.094 


0.619 


0.021 


0.517 


40.335 


C3NET 


0.080 


0.529 


0.026 


0.506 


0.018 


0.501 


0.000 


CLR 


0.217 


0.666 


0.050 


0.538 


0.019 


0.505 


4.928 


MRNET 


0.194 


0.668 


0.041 


0.525 


0.018 


0.501 


2.534 



ARACNE 0.099 0.545 0.029 0.512 0.017 0.500 0.000 



Winner of the challenge 



GENIE3 


0.291 


0.815 


0.093 


0.617 


0.021 


0.518 


40.279 


ANOVA y] 2 


0.245 


0.780 


0.119 


0.671 


0.022 


0.519 


34.023 


TIGRESS 


0.301 


0.782 


0.069 


0.595 


0.020 


0.517 


31.099 



Results of the different inference methods on DREAM5 networks. An area under the ROC curve (AUROC) and an area under the Precision-Recall curve (AUPR) are given 
for each network respectively. The Overall Score for all the networks is given in the last column. The best results for each column are in bold. Numbers in the 
"Experimental results" part of the table were collected after running the algorithms with the default sets of parameters on pre-processed data. Numbers in the 
"Winner of the challenge" part of the table correspond to the best methods participating in the challenge. 



the information-theoretic methods: CLR, MRNET, and 
ARACNE does not include a calculation of the mutual 
information matrix. 

When implementing ENNET algorithm we took advan- 
tage of the fact that gene selection problems are indepen- 
dent of each other. Our implementation of the algorithm 
is able to calculate them in parallel if multiple process- 
ing units are available. User can choose from variety of 
parallel backends including multicore package for a single 
computer and parallelization based on Message Passing 
Interface for a cluster of computers. The biggest data we 
provided as input in our tests were in vivo expression pro- 
files of S. cerevisiae from the DREAM 5 challenge. These 
are genome-wide expression profiles of 5950 genes (333 



Table 6 The computational complexity of ENNET and the 
other GRN inference methods 



Method 


Complexity 


ENNET 


0(TP 2 N), T = 5000 


ADANET 


0(CTP 2 N),C = 30,7= TV^l 


GENIE3 


OiTKPN log N),T=] 000, K = [VP] 


C3NET 


0(P 2 ) 


CLR 


0(P 2 ) 


MRNET 


0(fP 2 ),f e[1,P] 


ARACNE 


0(P 3 ) 



The computational complexity of ENNET and the other GRN inference methods 
with respect to the number of genes P and the number of samples N. The 
computational complexity of CLR, MRNET, and ARACNE is given without 
calculating the Mutual Information matrix. 



of them are known transcription factors) measured in 536 
experiments. It took 113 minutes and 30 seconds to cal- 
culate the network on a standard desktop workstation 
with one Intel®Core™i7-870 processor with 4 cores and two 
threads per core (in total 8 logical processors) and 16 GB 
RAM. However, it took only 16 minutes and 40 seconds to 
calculate the same network on a machine with four AMD 
Opteron™6282 SE processors, each with 8 cores and two 
threads per core (in total 64 logical processors) and 256 
GB RAM. All the data sets from the DREAM 3 and the 
DREAM 4 challenges were considerably smaller, up to 100 
genes. It took less than one minute to calculate each of 
these networks on a desktop machine. 

Setting parameters of ENNET 

The ENNET algorithm is controlled by four parameters: 
the two sampling rates s s and sj, the number of iterations 
T and the learning rate v. The sampling rate of samples 
s s and the sampling rate of transcription factors Sf gov- 
ern the level of randomness when selecting, respectively, 
rows and columns of the expression matrix to fit a regres- 
sion model. The default choice of the value of s s is 1, i.e., 
we select with replacement a bootstrap sample of obser- 
vations of the same size as an original training set at each 
iteration. Because some observations are selected more 
than once, around 0.37 of random training samples are out 
of bag in each iteration. It is more difficult to choose an 
optimal value of Sf, which governs how many transcrip- 
tion factors are used to fit each base learner. Setting this 
parameter to a low value forces ENNET to score transcrip- 
tion factors, even if their improvement criterion, as shown 
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in Equation 2, would not have promoted them in a pure 
greedy search, i.e., Sf = 1. However, if a chance of selecting 
a true transcription factor as a feature is too low, ENNET 
will suffer from selecting random genes as true regulators. 

Even though reverse-engineering of GRNs does not 
explicitly target a problem of predicting gene expres- 
sion, we choose the values of sampling rates such that 
the squared-error loss of a prediction of the target gene 
expression as given byfr (see Figure 1) is minimal. This 
is done without looking at the ground truth of regula- 
tory connections. For each benchmark challenge we per- 
formed a grid search over (s st Sf) e {0.1,0.3,0.5,0.7, 1} x 
{0.1,0.3,0.5,0.7,1} with fixed v = 0.001, T = 5000. 
For each specific set of parameters we analyzed an aver- 
age 5-fold cross-validated loss over all the observations 
(across all gene selection problems). We further ana- 
lyze our approach with respect to one of the challenges: 
DREAM4 size 100, as shown in Figure 3. The minimal 
average loss was achieved for s s = 1 and sj = 0.3 (see 
Figure 3 A), which is consistent with the default param- 
eters proposed for Random Forest algorithm [28]. We 
also compared the measure based on an average loss with 
the Overall Score as defined by Equation 6. The results 
were consistent across the two measures, i.e., a selec- 
tion of parameters that gave a low average loss also led 
to the accurate network predictions (see Figure 3 B). An 
advantage of the average loss measure is a fact that the 
gold standard network is not used to tune parameters. 

In Figure 4 we present a detailed analysis of the accu- 
racy of the GRN inference across different networks of 
the DREAM4 size 100 challenge. Each point on both 



Figure 4 A and Figure 4 B is a result of running ENNET 
with different parameters: (s s ,Sf) e {0.1,0.3,0.5,0.7,1} x 
{0.1,0.3,0.5,0.7, 1} with fixed v = 0.001, T = 5000. The 
highlighted points are corresponding to s s = 1, Sf = 0.3, 
v = 0.001, T = 5000. An area under the Precision- 
Recall curve and an area under the ROC curve are two 
different measures of the accuracy of an inferred network, 
which are well preserved across the five networks: for each 
separate network we observe that AUPR and AUROC 
decreases in a function of an average loss. As the Overall 
Score is closely related to AUPR and AUROC, the results 
shown in Figure 4 explain the shape of a surface shown in 
Figure 3. 

As ENNET uses boosting, it needs a careful tuning of 
the number of iterations T and the learning rate v. It has 
been shown [38] that parameters T and v are closely cou- 
pled. Usually the best prediction results are achieved when 
v is fixed to a small positive number, e.g. v < 0.001, 
and the optimal value of TY is found in a process of 
cross-validation. As described above, we reason that the 
choice of parameters, which gives a low average loss on 
a cross-validated test set, leads to an accurate network 
prediction. Therefore in Figure 5 we present how an aver- 
age loss depends on T e {1, 5000} for different values 
of v e {0.001,0.005,0.01,0.05,0.1}, with fixed s s = 1, 
sj = 0.3. Each of the line shows how much ENNET over- 
trains the data for a given T and v. Finally, the optimal 
choice of parameters for DREAM4 size 100 challenge is 
s s = l, Sf = 0.3, T = 5000, v = 0.001. Following the same 
practice, we used this default set of parameters: s s = 1, 
Sf = 0.3, T = 5000, v = 0.001 to evaluate ENNET 




Figure 3 The analysis of the sampling rates s 5 and Sf for DREAM 4 size 1 00 challenge: the Overall Score and a loss. The analysis of the 
sampling rates s s and s f for the DREAM 4 size 1 00 challenge. A: For each set of parameters (s s , s f , M, v) e {0.1 , 0.3, 0.5, 0.7, 1 } x {0.1 , 0.3, 0.5, 0.7, 1 } x 
{5000} x {0.001 } we analyzed an average 5-fold cross-validated loss over all the observations (across all gene selection problems) from all 5 
networks. The minimal average loss was achieved for high values of s s = 1 and low values of Sf = 0.3. B: We also compared the measure based on 
an average loss with the original Overall Score, as proposed by the authors of the DREAM challenge. The results were consistent across the two 
measures, i.e., the parameters that gave low average loss also led to accurate network predictions (a high Overall Score). 
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Figure 4 The analysis of the sampling rates s s and Sf for DREAM 4 size 100 challenge: AUPR, AUROC, and a loss. The analysis of the sampling 
rates s s and s f for DREAM 4 size 1 00 challenge. A: For each set of parameters (s s , s fl M, v) e {0.1 , 0.3, 0.5, 0.7, 1 } x {0.1 , 0.3, 0.5, 0.7, 1 } x {5000} x {0.001 } 
we analyzed an area under the Precision-Recall curve (AUPR) in function of an average 5-fold cross-validated loss over all the observations (across all 
gene selection problems) from all 5 networks. For each network AUPR is decreasing in a function of a loss. For each network a point corresponding 
to the default set of parameters is highlighted, i.e., (s Sl Sf,M, v) = (1,0.3,5000,0.001) . Usually, the default set of parameters gives the minimal loss 
(maximal AUPR). B: By analogy, different choices of parameters lead to a different area under the ROC curve (AUROC). The two measures are 
consistent with each other. 



algorithm on all the benchmark networks using ground 
truth, i.e., for calculating the Overall Score and comparing 
it to the other algorithms. 

Stability of EN NET 

Because ENNET uses random sampling of samples and 
features at each iteration of the main loop, as shown in 
Figure 1, it may calculate two different networks for two 
different executions on the same expression data. With 
the default choice of parameters, i.e., s s = 1, Sf = 0.3, 
T = 5000, v = 0.001, we expect numerous random 
resamplings, and therefore we need to know if a GRN 
calculated by ENNET is stable between different execu- 
tions. We applied ENNET to the 5 networks that form 
DREAM 4 size 100 benchmark, repeating the inference 
calculations independently ten times for each network. 



Then, for each network, we calculated a Spearman's rank 
correlation between all pairs among the ten independent 
runs. The lowest correlation coefficient we obtained was 
p > 0.975, with p-vdlue < 2.2e — 16, indicating that 
the networks that result from independent runs are very 
similar. This proves that ENNET, despite being a random- 
ized algorithm, finds a stable solution to the inference 
problem. 

Conclusions 

We have proposed the ENNET algorithm for reverse- 
engineering of Gene Regulatory Networks. ENNET uses 
a variety of types of expression data as an input, and 
shows robust performance across different benchmark 
networks. Moreover, it does not assume any specific 
model of a regulatory interaction and do not require 



Average loss in function of T 

shrinkage factor 

0.1 0.05 0.01 0.005 0.001 




1 1000 2000 3000 4000 5000 

T 



Figure 5 The analysis of the number of iterations T and the shrinkage v for the DREAM 4 size 100 challenge. The analysis of the number of 
iterations T and the shrinkage factor v for DREAM 4 size 1 00 challenge. These two parameters are closely coupled: the lower is the shrinkage 
parameter v, the more iterations T are needed to train the model such that it achieves the minimal loss. 



Stawek and Arodz BMC Systems Biology 201 3, 7:1 06 
http://www.biomedcentral.eom/1 752-0509/7/1 06 



Page 12 of 13 



fine-tuning of its parameters, i.e., we define the default 
set of parameters, which promises accurate predictions 
for the future networks. Nevertheless, together with the 
algorithm, we propose a procedure of tuning parameters 
of ENNET towards minimizing empirical loss. Processing 
genome-scale expression profiles is feasible with ENNET: 
including up to a few hundred transcription factors, and 
up to a few thousand regulated genes. As shown in this 
study, the proposed method compares favorably to the 
state-of-the-art algorithms on the universally recognized 
benchmark data sets. 
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